Abstract. We present an optimal data structure for submatrix maximum queries in n x n Monge matrices.
Our result is a two-way reduction showing that the problem is equivalent to the classical predecessor
problem in a universe of polynomial size. This gives a data structure of O(n) space that answers submatrix
maximum queries in O(log log n) time, as well as a matching lower bound, showing that O(log log n)
query-time is optimal for any data structure of size O(n polylog(n)). Our result settles the problem,
improving on the O(log^2 n) query-time in SODA'12, and on the O(log n) query-time in ICALP'14.

In addition, we show that partial Monge matrices can be handled in the same bounds as full Monge
matrices. In both previous results, partial Monge matrices incurred additional inverse-Ackermann factors.


1 Introduction 

Data structures for range queries and for predecessor queries are among the most studied data structures in
computer science. Given an n x n matrix M, a range maximum (also called submatrix maximum) data structure
can report the maximum entry in any query submatrix (a set of consecutive rows and a set of consecutive
columns) of M. Given a set S ⊆ [0, U) of n integers from a polynomial universe U, a predecessor data structure
can report the predecessor (and successor) in S of any query integer x ∈ [0, U). In this paper, we prove
that these two seemingly unrelated problems are in fact equivalent when the matrix M is a Monge matrix.

Range maximum queries. A long line of research over the last three decades including [4,11,12,15,23]
achieved range maximum data structures of Õ(n^2) space and Õ(1) query time (the Õ(·) notation hides
polylogarithmic factors), culminating with the O(n^2)-space O(1)-query data structure of Yuan and
Atallah [23]. In general matrices, this is optimal since representing the input matrix already requires
Θ(n^2) space. In fact, reducing the additional space to O(n^2/c) is known to incur an Ω(c) query-time [6]
and such tradeoffs can indeed be achieved for any value of c [5,6].

However, in many applications, the matrix M is not stored explicitly but any entry of M can be computed
when needed in O(1) time. One such case is when the matrix M is sparse, or simply has N = o(n^2) nonzero
entries. In this case the problem is known in computational geometry as the orthogonal range searching problem
on the n x n grid. In this case as well, various data structures with Õ(N)-space and Õ(1)-query appear in a long
history of results including [3,8,10,13,15]. For a survey on orthogonal range searching see [21]. Another case
where the additional space can be made o(n^2) (and in fact even O(n)) is when the matrix is a Monge matrix.

Range maximum queries in Monge matrices. A matrix M is Monge if for any pair of rows i < j and
columns k < ℓ we have that M[i, k] + M[j, ℓ] ≥ M[i, ℓ] + M[j, k]. Submatrix maximum queries on Monge
matrices have various important applications in combinatorial optimization and computational geometry
such as problems involving distances in the plane, and in problems on convex n-gons. See [7] for a survey
on Monge matrices and their uses in combinatorial optimization. Submatrix maximum queries on Monge
matrices are used in algorithms that efficiently find the largest empty rectangle containing a query point, in
dynamic distance oracles for planar graphs, and in algorithms for maximum flow in planar graphs. See [17]
for more details on the history of this problem and its applications.
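The Monge condition above can be tested mechanically. The following is a minimal sketch, assuming the matrix is given as a list of equal-length rows; the function name is_monge is hypothetical, and the code relies on the standard fact that checking adjacent rows and adjacent columns suffices.

```python
def is_monge(M):
    """Check M[i][k] + M[j][l] >= M[i][l] + M[j][k] for all i < j and k < l.

    Only adjacent rows and adjacent columns are tested; summing these
    local inequalities yields the general one.
    """
    rows, cols = len(M), len(M[0])
    return all(
        M[i][k] + M[i + 1][k + 1] >= M[i][k + 1] + M[i + 1][k]
        for i in range(rows - 1)
        for k in range(cols - 1)
    )


# Illustration: M[i][j] = i * j satisfies every adjacent inequality with slack 1.
example = [[i * j for j in range(4)] for i in range(3)]
print(is_monge(example))
```

This brute-force test takes time proportional to the number of entries, so it is meant only as a sanity check on small inputs.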

Given an n x n Monge matrix M it is possible to obtain compact data structures of only Õ(n) space that
can answer submatrix maximum queries in Õ(1) time. The first such data structure was given by Kaplan,
Mozes, Nussbaum and Sharir [17]. They presented an O(n log n)-space data structure with O(log^2 n) query
time. This was improved in [16] to O(n) space and O(log n) query time.

Breakpoints and Partial Monge matrices. Given an m x n Monge matrix M, let r(c) be the row
containing the maximum element in the c-th column of M. It is easy to verify that the r(·) values are
monotone, i.e., r(1) ≤ r(2) ≤ ... ≤ r(n). Columns c such that r(c − 1) < r(c) are called the breakpoints
of M. A Monge matrix consisting of m < n rows has O(m) breakpoints, which can be found in O(n) time
using the SMAWK algorithm [2].
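To make the definition concrete, here is a naive sketch that computes r(c) and the breakpoints by scanning every column. It is not the SMAWK algorithm and runs in O(mn) time. The function name breakpoints, the 0-based indexing, the convention of reporting the first column as a breakpoint, and the choice of the topmost row among ties are all assumptions of this illustration.

```python
def breakpoints(M):
    """Return (r, bps) for a Monge matrix M given as a list of rows.

    r[c] is the topmost row attaining the maximum of column c (0-indexed).
    bps lists column 0 together with every column c with r[c - 1] < r[c].
    """
    m, n = len(M), len(M[0])
    # max() returns the first maximal element, i.e. the topmost row.
    r = [max(range(m), key=lambda i: M[i][c]) for c in range(n)]
    bps = [c for c in range(n) if c == 0 or r[c - 1] < r[c]]
    return r, bps
```

For M[i][j] = −(i − j)^2, which satisfies the Monge inequality, the maximum of column c lies on the diagonal, so every column with c < m is reported as a breakpoint.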

Some applications involve partial Monge matrices rather than full Monge matrices. A partial Monge
matrix is a Monge matrix where some of the entries are undefined, but the defined entries in each row and in
each column are contiguous. The total number of breakpoints in a partial Monge matrix is still O(m) [16], and
they can be found in O(n · α(n)) time^3 using an algorithm of Klawe and Kleitman [18]. This was used in [16,17]
to extend their solutions to partial Monge matrices at the cost of an additional α(n) factor to the query time.^4

Our results. In this paper, we fully resolve the submatrix maximum query problem in n x n Monge
matrices by presenting a data structure of O(n) space and O(log log n) query time. Consequently, we obtain
an improved query time for other applications such as finding the largest empty rectangle containing a query
point. We complement our upper bound with a matching lower bound, showing that O(log log n) query-time
is optimal for any data structure of size O(n polylog(n)). In fact, implicit in our upper and lower bound is
an equivalence between the predecessor problem in a universe of polynomial size and the range maximum
query problem in Monge matrices. The upper bound essentially reduces a submatrix query to a predecessor
problem, and vice versa, the lower bound reduces the predecessor problem to a submatrix query problem.

Finally, we extend our result to partial Monge matrices with the exact same bounds (i.e., O(n) space
and O(log log n) query time). Our result is the first to achieve such an extension with no overhead.

Techniques. Let M be an n x n Monge matrix^5. Consider a full binary tree T whose leaves are the rows of
M. Let M_u be the submatrix of M composed of all rows (i.e., leaves) in the subtree of a node u in T. Both
existing data structures for submatrix maximum queries [16,17] store, for each node u in T, a data structure
D_u. The goal of D_u is to answer submatrix maximum queries for queries that include an arbitrary interval
of columns and exactly all rows of M_u. This way, an arbitrary query is covered in [16,17] by querying the D_u
structures of O(log n) canonical nodes of T. An Ω(log n) bound is thus inherent for any solution that examines
the canonical nodes. We overcome this obstacle by designing a stronger data structure D_u. Namely, one that
supports queries that include an arbitrary interval of columns and a prefix of rows or a suffix of rows of M_u.
This way, an arbitrary query can be covered by just two D_u's. The idea behind the new design is to efficiently
encode the changes in column maxima as we add rows to M_u one by one. Retrieving this information is done
using weighted ancestor search and range maximum queries on trees. This is a novel use of these techniques.

For our lower bound, we show that for any set of n integers S ⊆ [0, n^2) there exists an n x n Monge
matrix M such that the predecessor of x in S can be found with submatrix minimum queries on M. The
predecessor lower bound of Pătraşcu and Thorup [22] then implies that O(n polylog(n)) space requires
Ω(log log n) query time. We overcome two technical difficulties here: First, M should be Monge. Second, there
must be an O(n polylog(n))-size representation of M which can retrieve any entry M[i, j] in O(1) time.

Finally, for handling partial Monge matrices, and unlike previous solutions for this case, we do not 
directly adapt the solution for the full Monge case to partial Monge matrices. Instead we decompose the 
partial Monge matrix into many full Monge matrices, that can be preprocessed to be queried cumulatively in 
an efficient way. This requires significant technical work and careful use of the structure of the decomposition. 

^3 Here α(n) is the inverse-Ackermann function.

^4 In [17], there was also an additional log n factor to the space.

^5 We consider m x n matrices, but for simplicity we sometimes state the results for n x n matrices.
Roadmap. In Section 2 we present an O(n log n)-space data structure for Monge matrices that answers
submatrix maximum queries in O(log log n) time. In Section 3 we reduce the space to O(n). Our lower
bound is given in Section 4, and the extension to partial Monge matrices in Section 5.

2 Data structure for Monge matrices 

Our goal in this section is to construct, for a given m x n Monge matrix M, a data structure of size
O(m log n) that answers submatrix maximum queries in O(log log n) time. In Section 3 we show how to
reduce the space from O(n log n) to O(n) when m = n. We will actually show a stronger result, namely the
structure allows us to reduce in O(1) time a submatrix maximum query into O(1) predecessor queries on a
set consisting of n integers from a polynomial universe.

We denote by pred(m, n) the complexity of a predecessor query on a set of m integers
from a universe {0, ..., n − 1}. It is well known that there are O(m)-space data structures achieving
pred(m, n) = min{O(log m), O(log log n)}.
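As a point of reference, the O(log m) branch of this bound is achieved by binary search over a sorted array. The sketch below illustrates only that simple branch, not the O(log log n) structures; the function name predecessor is an assumption of this illustration.

```python
from bisect import bisect_right


def predecessor(sorted_keys, x):
    """Largest key <= x in sorted_keys, or None if every key exceeds x."""
    pos = bisect_right(sorted_keys, x)
    return sorted_keys[pos - 1] if pos > 0 else None
```

The successor is obtained symmetrically with bisect_left.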

Recall that a submatrix maximum query returns the maximum M[i, j] over all i ∈ [i_0, i_1] and j ∈ [j_0, j_1]
for a given i_0 ≤ i_1 and j_0 ≤ j_1. We start by answering the easier subcolumn maximum queries within these
space and time bounds. That is, finding the maximum M[i, j] over all i ∈ [i_0, i_1] for a given i_0 ≤ i_1 and j.

We construct a full binary tree T over the rows of M. Every leaf of the tree corresponds to a single row
of M, and every inner node corresponds to the range of rows in its subtree. To find the maximum M[i, j]
over all i ∈ [i_0, i_1] for a given i_0 ≤ i_1 and j, we first locate the lowest common ancestor (lca) u of the leaves
corresponding to i_0 and i_1 in the tree. Then we decompose the query into two parts: one fully within the
range of rows M_ℓ of the left child of u and one fully within the range of rows M_r of the right child of u. The
former ends at the last row of M_ℓ and the latter starts at the first row of M_r. We equip every node with
two data structures allowing us to answer such simpler subcolumn maximum queries. Because of symmetry
(if M is Monge, so is M', where M'[i, j] = M[m + 1 − i, n + 1 − j]) it is enough to show how to answer
subcolumn maximum queries starting at the first row.
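The decomposition at the lowest common ancestor can be sketched with bit operations. The sketch assumes that the rows are numbered from 0, that their number is padded to a power of two, and that T is the complete binary tree over them. The function name split_at_lca is hypothetical.

```python
def split_at_lca(i0, i1):
    """Split the row range [i0, i1] (0-indexed, i0 < i1) at the lca of leaves i0 and i1.

    Returns ((i0, mid - 1), (mid, i1)): a suffix of the rows of the left child
    of the lca and a prefix of the rows of its right child.
    """
    b = (i0 ^ i1).bit_length() - 1  # height of the lca's children
    mid = (i1 >> b) << b            # first row of the right child's range
    return (i0, mid - 1), (mid, i1)
```

The highest bit in which i0 and i1 differ identifies the level of the lca, so the split takes constant time on a word RAM.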

Lemma 1. Given an m x n Monge matrix M, a data structure of size O(m) can be constructed in O(m log n)
time to answer in O(pred(m, n)) time subcolumn maximum queries starting at the first row of M.

Proof. Consider queries spanning an entire column c of M. To answer such a query, we only need to find
the corresponding r(c). If we store the breakpoints of M in a predecessor structure, where every breakpoint
c links to its corresponding value of r(c), a query can be answered with a single predecessor search. More
precisely, to determine the maximum in the c-th column of M, we locate the largest breakpoint c' ≤ c,
and then set r(c) = r(c'). Hence we can construct a data structure of size O(m) to answer entire column
maximum queries in O(pred(m, n)) time.

Let M_i be a Monge matrix consisting of the first i rows of M. By applying the above reasoning to
every M_i separately, we immediately get a structure of size O(m^2) answering subcolumn maximum queries
starting at the first row of M in O(pred(m, n)) time. We want to improve on this by utilizing the dependency
of the structures constructed for different i's. Namely, it can be observed that the list of breakpoints of
M_{i+1} is a prefix of the list of breakpoints of M_i to which we append at most one new element. In other
words, if the breakpoints of M_i are stored on a stack, we need to pop zero or more elements and push at
most one new element to represent the breakpoints of M_{i+1}. Consequently, instead of storing a separate
list for every M_i, we can succinctly describe the content of all stacks with a single tree T on at most m + 1
nodes. For every i, we store a pointer to a node s(i) ∈ T, such that the ancestors of s(i) (except for the
root) are exactly the breakpoints of M_i. Whenever we pop an element from the current stack, we move to
the parent of the current node, and whenever we push an element, we create a new node and make it a
child of the current node. Initially, the tree consists of just the root. Every node is labelled with a column
number and by construction these numbers are strictly increasing on any path starting at the root (the root
is labelled with −∞). Therefore, a predecessor search for j among the breakpoints of M_i reduces to finding
the leafmost ancestor of s(i) whose label is at most j. This is known as the weighted ancestor problem.
Weighted ancestor queries on a tree of size O(m) are equivalent to predecessor searching on a number of
sets of O(m) total size [19]^6, achieving the claimed space and query time bounds.

To finish the proof, we need to bound the construction time. The bottleneck is constructing the tree
T. Let c_1 < c_2 < ... < c_k for some k ≤ i be the breakpoints of M_i. As long as M[i + 1, c_k] ≥ M[r(c_k), c_k]
we decrease k by one, i.e., remove the last breakpoint. This process is repeated O(m) times in total.
If k = 0 we create a new breakpoint c_1 = 1. If k ≥ 1 and M[i + 1, c_k] < M[r(c_k), c_k], we check if
M[i + 1, n] ≥ M[r(c_k), n]. If so, we need to create a new breakpoint. To this end, we need to find the smallest
j such that M[i + 1, j] ≥ M[r(c_k), j]. This can be done in O(log n) time using binary search. Consequently, T can be
constructed in O(m log n) time. Then augmenting it with a weighted ancestor structure takes O(m) time. □
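The pop-and-push construction from the proof can be sketched as follows. This is an illustration under stated assumptions rather than the structure of the lemma: indices are 0-based, the root carries the label −1 instead of −∞, ties are resolved in favour of the newly added row, and the weighted ancestor query is replaced by a naive walk towards the root. The names build_prefix_tree and prefix_column_max are hypothetical.

```python
def build_prefix_tree(M):
    """Tree of breakpoint stacks for all row prefixes of a Monge matrix M.

    Node t stores col[t] (breakpoint column), row[t] (row attaining the column
    maxima from that column on) and parent[t]. Node 0 is the root, labelled -1.
    s[i] is the node representing the prefix consisting of rows 0..i.
    """
    n = len(M[0])
    col, row, parent, s = [-1], [None], [None], []
    cur = 0
    for new in range(len(M)):
        # Pop: drop breakpoints whose row is beaten by the new row at their column.
        while cur != 0 and M[new][col[cur]] >= M[row[cur]][col[cur]]:
            cur = parent[cur]
        if cur == 0:
            j = 0                      # the new row wins in every column
        elif M[new][n - 1] >= M[row[cur]][n - 1]:
            lo, hi = col[cur] + 1, n - 1
            while lo < hi:             # binary search for the smallest winning column
                mid = (lo + hi) // 2
                if M[new][mid] >= M[row[cur]][mid]:
                    hi = mid
                else:
                    lo = mid + 1
            j = lo
        else:
            j = None                   # the new row wins nowhere
        if j is not None:              # push: create a child of the current node
            col.append(j)
            row.append(new)
            parent.append(cur)
            cur = len(col) - 1
        s.append(cur)
    return col, row, parent, s


def prefix_column_max(M, tree, i, j):
    """Maximum of column j over rows 0..i, via a naive walk up from s[i]."""
    col, row, parent, s = tree
    t = s[i]
    while col[t] > j:
        t = parent[t]
    return M[row[t]][j]
```

The binary search is valid because, for a fixed pair of rows of a Monge matrix, the lower row wins on a suffix of the columns. Replacing the upward walk by a weighted ancestor structure gives the query time stated in the lemma.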

We apply Lemma 1 twice to every node of the full binary tree T. Once for subcolumn maximum queries
starting at the first row and once for queries ending at the last row. Since the total size of all structures
at the same level of the tree is O(m), the total size of our subcolumn maximum data structure becomes
O(m log m), and it can be constructed in O(m log m log n) time to answer queries in O(pred(m, n)) time.
Hence we have proved the following.

Theorem 1. Given an m x n Monge matrix M, a data structure of size O(m log m) can be constructed in
O(m log m log n) time to answer subcolumn maximum queries in O(pred(m, n)) time.

By symmetry (a transpose of a Monge matrix is Monge) we can answer subrow maximum queries (where
the query is a single row and a range of columns) in O(pred(n, m)) time. We are now ready to tackle general
submatrix maximum queries.

At a high level, the idea is identical to the one used for subcolumn maximum queries: we construct a full
binary tree T over the rows of M, where every node corresponds to a range of rows. To find the maximum
M[i, j] over all i ∈ [i_0, i_1] and j ∈ [j_0, j_1] for a given i_0 ≤ i_1 and j_0 ≤ j_1, we locate the lowest common
ancestor of the leaves corresponding to i_0 and i_1 and decompose the query into two parts, the former ending
at the last row of M_ℓ and the latter starting at the first row of M_r. Every node is equipped with two data
structures allowing us to answer submatrix maximum queries starting at the first row or ending at the last
row. As before, it is enough to show how to answer submatrix maximum queries starting at the first row.

Lemma 2. Given an m x n Monge matrix M, and a data structure that answers subrow maximum
queries on M in O(pred(n, m)) time, one can construct in O(m log m) time a data structure consuming
O(m) additional space, that answers submatrix maximum queries starting at the first row of M in
O(pred(m, n) + pred(n, m)) time.

Proof. We extend the proof of Lemma 1. Let c_1 < c_2 < ... < c_k be the breakpoints of M stored in a
predecessor structure. For every i ≥ 2 we precompute and store the value

\[ m_i = \max_{j \in [c_{i-1}, c_i)} M[r(c_{i-1}), j]. \]

These values are augmented with a (one dimensional) range maximum query data structure. To begin with,
consider a submatrix maximum query starting at the first row of M and ending at the last row of M, i.e.,
we need to calculate the maximum M[i, j] over all i ∈ [1, m] and j ∈ [j_0, j_1]. We find in O(pred(m, n)) time the
successor of j_0, denoted c_i, and the predecessor of j_1, denoted c_{i'}. There are three possibilities:

1. The maximum is reached for j ∈ [j_0, c_i),

2. The maximum is reached for j ∈ [c_i, c_{i'}),

3. The maximum is reached for j ∈ [c_{i'}, j_1].

The first and the third possibilities can be calculated with subrow maximum queries in O(pred(n, m)) time,
because both ranges span an interval of columns and a single row. The second possibility can be calculated
with a range maximum query on the range (i, i']. Consequently, we can construct a data structure of size
O(m) to answer such submatrix maximum queries in O(pred(m, n) + pred(n, m)) time.
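The three cases can be sketched for queries that span all rows. The code below is an illustration with stand-ins: indices are 0-based, the first column is always treated as a breakpoint, subrow maxima and the range maximum over the m_i values are computed by plain scans rather than by the structures of the proof, and the degenerate case in which no breakpoint falls inside the column range is handled by a single subrow query. The names preprocess and full_height_max are hypothetical.

```python
from bisect import bisect_left, bisect_right


def preprocess(M):
    """Breakpoint columns c, their rows r, and m[t] = max of row r[t-1] over [c[t-1], c[t])."""
    rows, n = len(M), len(M[0])
    argmax = [max(range(rows), key=lambda i: M[i][j]) for j in range(n)]
    c = [j for j in range(n) if j == 0 or argmax[j - 1] != argmax[j]]
    r = [argmax[j] for j in c]
    m = [None] + [max(M[r[t - 1]][c[t - 1]:c[t]]) for t in range(1, len(c))]
    return c, r, m


def full_height_max(M, pre, j0, j1):
    """Maximum of M over all rows and the columns j0..j1 (inclusive)."""
    c, r, m = pre
    a = bisect_left(c, j0)       # index of the successor of j0 among the breakpoints
    b = bisect_right(c, j1) - 1  # index of the predecessor of j1
    if a > b:                    # no breakpoint in [j0, j1]: one row holds all maxima
        return max(M[r[b]][j0:j1 + 1])
    best = max(M[r[b]][c[b]:j1 + 1])                 # case 3: columns [c_b, j1]
    if j0 < c[a]:
        best = max(best, max(M[r[a - 1]][j0:c[a]]))  # case 1: columns [j0, c_a)
    if a < b:
        best = max(best, max(m[a + 1:b + 1]))        # case 2: columns [c_a, c_b)
    return best
```

In the data structure of the proof, the two bisect calls correspond to the predecessor and successor searches, the slices of a single row correspond to subrow maximum queries, and the scan over m corresponds to the range maximum query on (i, i'].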

^6 Technically, the reduction adds O(log* m) to the query time, but this can be avoided.
The above solution can be generalized to queries that start at the first row of M but do not necessarily
end at the last row of M. This is done by considering the Monge matrices M_i consisting of the first i rows
of M. For every such matrix, we need a predecessor structure storing all of its breakpoints, and additionally
a range maximum structure over their associated values. Hence now we need to construct a similar tree T
as in Lemma 1 on O(m) nodes, but now every node has both a weight and a value. The weight of a node
is the column number of the corresponding breakpoint c_k, and the value is its m_k (or undefined if k = 1).
As in Lemma 1, the breakpoints of M_i are exactly the ancestors of the node s(i). Note that every m_k is
defined in terms of c_{k−1} and c_k, but this is not a problem because the predecessor of a breakpoint does
not change during the whole construction. We maintain a weighted ancestor structure using the weights (in
order to find c_i and c_{i'} in O(pred(m, n)) time), and a generalized range maximum structure using the values.
A generalized range maximum structure of a tree T, given two query nodes u and v, returns the maximum
value on the unique u-to-v path in T. It can be implemented in O(m) space and O(1) query time after
O(m log m) preprocessing [12] once we have the values. The values can be computed with subrow maximum
queries in O(m · pred(n, m)) = O(m log m) total time. □

By applying Lemma 2 twice to every node of the full binary tree T, we construct in O(m log^2 m) time
a data structure of size O(m log m) to answer submatrix maximum queries in O(pred(m, n) + pred(n, m))
time. In order to apply Lemma 2 to a node of T we need a subrow maximum query data structure for the
corresponding rows of the matrix M. Note, however, that a single subrow maximum query data structure
for M can be used for all nodes of T.

Theorem 2. Given an m x n Monge matrix M, and a data structure answering subrow maximum queries
on M in O(pred(n, m)) time, one can construct in O(m log^2 m) time a data structure taking O(m log m)
additional space, that answers submatrix maximum queries on M in O(pred(m, n) + pred(n, m)) time.

By combining Theorem 1 with Theorem 2, given an n x n Monge matrix M, a data structure of size
O(n log n) can be constructed in O(n log^2 n) time to answer submatrix maximum queries in O(pred(n, n))
time.

3 Obtaining linear space 

In this section we show how to decrease the space of the data structure presented in Section 2 to be linear.
We extend the idea developed in our previous paper [16]. The previous linear space solution was based on
partitioning the matrix M into n/x matrices M_1, M_2, ..., M_{n/x}, where each M_i is a slice of M consisting
of x = log n consecutive rows. Then, instead of working with the matrix M, we worked with the (n/x) x n
matrix M', where M'[i, j] is the maximum entry in the j-th column of M_i.
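The definition of M' can be illustrated by computing it explicitly. The sketch below materializes M' by brute force, which is only meant to show what its entries are; the data structure itself never stores M' and instead retrieves single entries on demand. The function name compress_slices is hypothetical, and the last slice is allowed to be shorter when x does not divide the number of rows.

```python
def compress_slices(M, x):
    """Return M', where M'[i][j] is the maximum of column j within the i-th
    slice of x consecutive rows of M (the last slice may be shorter)."""
    n = len(M[0])
    return [
        [max(M[r][j] for r in range(top, min(top + x, len(M)))) for j in range(n)]
        for top in range(0, len(M), x)
    ]
```

For instance, with M[i][j] = i · j − i^2 on 4 rows and 6 columns and x = 2, the two rows of M' are the column-wise maxima of rows 0 to 1 and of rows 2 to 3.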

Subcolumn queries. Consider a subcolumn query. Suppose the query is entirely contained in some M_i.
This means it spans less than x = log n rows. In [16], since the desired query time was O(log n), a query
simply inspected all elements of the subcolumn. In our case however, since the desired query time is only
O(log log n), we apply the above partitioning scheme twice. We explain this now.

We start with the following lemma, that provides an efficient data structure for queries consisting of a single
column and all rows in rectangular matrices. The statement of the lemma was taken almost verbatim from
the previous solution [16]. Its query time was originally stated in terms of a query to a predecessor structure, but
here we prefer to directly plug in the bounds implied by atomic heaps [14] (which support predecessor searches
in constant time provided x is O(log n)). This requires only an additional O(n) time and space preprocessing.

Lemma 3 ([16]). Given an x x n Monge matrix, a data structure of size O(x) can be constructed in
O(x log n) time to answer entire-column maximum queries in O(1) time, if x = O(log n).

Our new subcolumn data structure is summarized in the following theorem. It uses the above lemma and 
two applications of the partitioning scheme. 
Theorem 3. Given an m x n Monge matrix M, a data structure of size O(m) can be constructed in
O(m log n) time to answer subcolumn maximum queries in O(log log(n + m)) time.

Proof. We first partition M into m/x matrices M_1, M_2, ..., M_{m/x}, where x = log m. Every M_i is a slice of
M consisting of x consecutive rows. Next, we partition every M_i into x/x' matrices M_{i,1}, M_{i,2}, ..., M_{i,x/x'},
where x' = log log m. Every M_{i,j} is a slice of M_i consisting of x' consecutive rows (without loss of generality,
assume that x divides m and x' divides x). Now we define a new (m/x) x n matrix M', where M'[i, j] is the
maximum entry in the j-th column of M_i. Similarly, for every M_i we define a new (x/x') x n matrix M'_i,
where M'_i[j, k] is the maximum entry in the k-th column of M_{i,j}.

We apply Lemma 3 on every M_i and M_{i,j} in O(m log n) total time and O(m) total space, so that any
M'[i, j] or M'_i[j, k] can be retrieved in O(1) time. Furthermore, it can be easily verified that M' and all M'_i's
are also Monge. To prove this, it is enough to argue that if N is a 4 x 2 Monge matrix, the 2 x 2 matrix
N' created by partitioning N into two slices consisting of two rows and computing the maximum in every
column of every slice is also Monge. To this end, we need to compare:

N'[1, 1] + N'[2, 2] = max(N[1, 1], N[2, 1]) + max(N[3, 2], N[4, 2])

and

N'[1, 2] + N'[2, 1] = max(N[1, 2], N[2, 2]) + max(N[3, 1], N[4, 1]).

Let max(N[1, 2], N[2, 2]) = N[i, 2], where i ∈ {1, 2}, and similarly max(N[3, 1], N[4, 1]) = N[i', 1], where
i' ∈ {3, 4}. Since N'[1, 1] ≥ N[i, 1] and N'[2, 2] ≥ N[i', 2], we get

(N'[1, 1] + N'[2, 2]) − (N'[1, 2] + N'[2, 1]) ≥ (N[i, 1] + N[i', 2]) − (N[i, 2] + N[i', 1])

which is at least 0 because of N being Monge.

Therefore, because M' and all M'_i are Monge, and by Lemma 3 their entries can be accessed
in O(1) time, we can apply Theorem 1 on M' and every M'_i. The total construction time is
O((m/x) log(m/x) log n + (m/x)(x/x') log(x/x') log n) = O(m log n), and the total size of all structures
constructed so far is O((m/x) log(m/x) + (m/x)(x/x') log(x/x')) = O(m).

Now consider a subcolumn maximum query. If the range of rows is fully within a single M_{i,j}, the query
can be answered naively in O(x') = O(log log m) time. Otherwise, if the range of rows is fully within a single
M_i, the query can be decomposed into a prefix fully within some M_{i,j}, an infix corresponding to a range of
rows in M'_i, and a suffix fully within some M_{i,j'}. The maximum in the prefix and the suffix can be computed
naively in O(x') = O(log log m) time, and the maximum in the infix can be computed in O(log log n) time using
the structure constructed for M'_i. Finally, if the range of rows starts inside some M_i and ends inside another
M_{i'}, the query can be decomposed into two queries fully within M_i and M_{i'}, respectively, which can be
processed in O(log log n) time as explained before, and an infix corresponding to a range of rows of M'. The
maximum in the infix can be computed in O(log log n) time using the structure constructed for M'. □

Submatrix queries. We are ready to present the final version of our data structure. It is based on two 
applications of the partitioning scheme, and an additional trick of transposing the matrix. 

Theorem 4. Given an n x n Monge matrix M, a data structure of size O(n) can be constructed in
O(n log n) time to answer submatrix maximum queries in O(log log n) time.

Proof. We partition M as described in the proof of Theorem 3, i.e., M is partitioned into n/x matrices
M_1, M_2, ..., M_{n/x}, where x = log n, and every M_i is then partitioned into x/x' matrices M_{i,1}, M_{i,2}, ...,
where x' = log log n. Then we define smaller Monge matrices M' and M'_i, and provide O(1) time access to
their entries with Lemma 3. We apply Theorem 3 to the transpose of M' to get a subrow maximum query data
structure for M'. This takes O(n) space and O(n log n) time. With this data structure we can apply Theorem 2
on M', which takes an additional O((n/x) log(n/x)) = O(n) space and O(n log n) time. We would have liked
to apply Theorem 3 to the transpose of all M'_i as well, but this would require O(n) space for each matrix,
which we cannot afford. Since we do not have a subrow maximum query data structure for the M'_i's, we cannot
apply Theorem 2 to them directly. However, note that the subrow maximum query data structure is used in
Theorem 2 in two ways (see the proof of Lemma 2). The first use is in directly finding the subrow maximum
in cases 1 and 3 in the proof of Lemma 2. In the absence of the subrow structure, we can still report the two
rows containing the candidate maximum, although not the maximum itself. The second use is in computing
the values for the generalized range maximum structure required to handle case 2 in that proof. In this case,
we do not really need the fast query of the data structure of Theorem 3, and can use instead the slower
linear space data structure from [16, Lemma 2] to compute the values in O(n log n) time. Thus, we can apply
Theorem 2 to each M'_i and get at most two candidate rows of M'_i (from cases 1 and 3), and one candidate
entry of M'_i (from case 2), with the guarantee that the submatrix maximum is among these candidates.

We repeat the above preprocessing on the transpose of M. Now consider a submatrix maximum query.
If the range of rows starts inside some M_i and ends inside another M_{i'}, the query can be decomposed into
two queries fully within M_i and M_{i'}, respectively, and an infix corresponding to a range of rows of M'.
The maximum in the infix can be computed in O(log log n) time using the structure constructed for M'.
Consequently, it is enough to show how to answer a query in O(log log n) time when the range of rows is fully
within a single M_i. In such case, if the range of rows starts inside some M_{i,j} and ends inside another M_{i,j'},
the query can be decomposed into a prefix fully within M_{i,j}, an infix corresponding to a range of rows in M'_i,
and a suffix fully within some M_{i,j'}. As we explained above, even though we cannot locate the maximum in
the infix exactly, we can isolate at most 2 rows (plus a single entry) of M'_i, such that the maximum lies in
one of these rows. Each row of M'_i corresponds to a range of rows fully inside some M_{i,j}. Consequently, we
reduced the query in O(log log n) time to a constant number of queries such that the range of rows in each
query is fully within a single M_{i,j}. Since each M_{i,j} consists of O(log log n) rows of M, we have identified, in
O(log log n) time, a set of O(log log n) rows of M that contain the desired submatrix maximum.

Now we repeat the same procedure on the transpose of M to identify a set of O(log log n) columns of
M that contain the desired submatrix maximum. Since a submatrix of a Monge matrix is also Monge, the
submatrix of M corresponding to these sets of candidate rows and columns is an O(log log n) x O(log log n)
Monge matrix. By running the SMAWK algorithm [2] in O(log log n) time on this small Monge matrix, we
can finally determine the answer. □

4 Lower Bound 

A predecessor structure stores a set of n integers S ⊆ [0, U), so that given x we can determine the largest
y ∈ S such that y ≤ x. As shown by Pătraşcu and Thorup [22], for U = n^2 any predecessor structure
consisting of O(n polylog(n)) words needs Ω(log log n) time to answer queries, assuming that the word size
is O(log n). We will use their result to prove that our structure is in fact optimal.

Given a set of n integers S ⊆ [0, n^2) we want to construct an n x n Monge matrix M such that the
predecessor of any x in S can be found using one submatrix minimum query on M and O(1) additional
time (to decide which query to ask and then return the final answer). Then, assuming that for any n x n
Monge matrix there exists a data structure of size O(n polylog(n)) answering submatrix minimum queries
in o(log log n) time, we can construct a predecessor structure of size O(n polylog(n)) answering queries in
o(log log n) time, which is not possible. The technical difficulty here is twofold. First, M should be Monge.
Second, we are working in the indexing model, i.e., the data structure for submatrix minimum queries can
access the matrix. Therefore, for the lower bound to carry over, M should have the following property: there
is a data structure of size O(n polylog(n)) which retrieves any M[i, j] in O(1) time. Guaranteeing that both
properties hold simultaneously is not trivial.

Before we proceed, let us comment on the condition S ⊆ [0, n^2). While a quadratic universe is enough to
invoke the Ω(log log n) lower bound for structures of size O(n polylog(n)), our reduction actually implies
that even for larger polynomially bounded universes, i.e., S ⊆ [0, n^c) for any fixed c, it is possible to
construct an n x n Monge matrix M such that the predecessor of x in S can be found with O(1) submatrix
minimum queries on M and O(1) additional time (and, as previously, any M[i, j] can be retrieved in O(1)
time with a structure of size O(n)). This is a consequence of the following lemma.
Lemma 4. For any c, predecessor queries on a set of n integers S ⊆ [0, n^c) can be reduced in O(1) time to
O(1) predecessor queries on a set of n integers S' ⊆ [0, n^2) with a structure of size O(n).

Proof. We explain the reduction for c = 4. Larger c are processed similarly.

Let S = {x_1, x_2, ..., x_n}. We represent every x_i in base n^2 as x_i = y_i · n^2 + z_i, where y_i, z_i ∈ [0, n^2).
We create a new set S' ⊆ [0, n^2) storing all y_i's and z_i's. Let rank(x) denote the rank of x in S'. We create
another set S'' ⊆ [0, n^2) storing elements of the form rank(y_i) · n + rank(z_i). It can be seen that finding the
predecessor of x in S can be solved by first representing x = y · n^2 + z, computing rank(y) and rank(z),
and finally locating the predecessor of rank(y) · n + rank(z) in S''. Consequently, a predecessor query on S
can be reduced into two predecessor queries on S' and S'', respectively. S' and S'' can be combined into a
single set S''' ⊆ [0, 2n^2), such that predecessor queries in either of them can be answered with predecessor
queries on S''', by simply shifting every element of S'' by n^2. Finally, the size of S''', which is up to 3n right
now, can be reduced to n by storing every third element. Knowing the predecessor of x among these chosen
elements allows us to find the true predecessor in O(1) time by inspecting at most three elements stored
explicitly for every element of the reduced S'''. □


The following propositions are easy to verify: 


Proposition 1. A matrix M is Monge iff M[i, j] + M[i + 1, j + 1] ≤ M[i + 1, j] + M[i, j + 1] for all i, j
such that all these entries are defined.

Proposition 2. If a matrix M is Monge, then for any vector H the matrix M', where M'[i,j] = 
M[i,j] + H[j] for all i,j, is also Monge. 
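
Proposition 2 can be checked directly against the adjacent-entries condition of Proposition 1. The short calculation below is a sketch that uses only the two statements above (with the inequality read as $\le$); it shows that the column offsets $H[j]$ and $H[j+1]$ cancel:

```latex
\begin{align*}
& M'[i,j] + M'[i+1,j+1] - M'[i+1,j] - M'[i,j+1] \\
&\quad= \bigl(M[i,j] + H[j]\bigr) + \bigl(M[i+1,j+1] + H[j+1]\bigr)
      - \bigl(M[i+1,j] + H[j]\bigr) - \bigl(M[i,j+1] + H[j+1]\bigr) \\
&\quad= M[i,j] + M[i+1,j+1] - M[i+1,j] - M[i,j+1] \;\le\; 0 .
\end{align*}
```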

Theorem 5. For any set of $n$ integers $S \subseteq [0, n^2)$, there exists a data structure of size $O(n)$ returning any
$M[i,j]$ in $O(1)$ time, where $M$ is a Monge matrix such that the predecessor of $x$ can be found using $O(1)$
time and one submatrix minimum query on $M$.

Proof. We partition the universe $[0, n^2)$ into $n$ parts $[0, n), [n, 2n), \ldots$. The $i$-th part $[i \cdot n, (i+1) \cdot n)$ defines
a Monge matrix $M_i$ consisting of $|S \cap [i \cdot n, (i+1) \cdot n)|$ rows and $n$ columns. The idea is to encode the
predecessor of $x \in [0, n^2)$ by the minimum element in the $(x \bmod n + 1)$-th column of $M_i$, where $i = \lfloor x/n \rfloor$
is the part containing $x$. We first describe how these matrices are defined, and then show how to stack them together.

Consider any $0 \le i < n$. Every element in $S \cap [i \cdot n, (i+1) \cdot n) = \{a_1, a_2, \ldots, a_k\}$ has a unique
corresponding row in $M_i$. Let $a_j = i \cdot n + a'_j$, so that $a'_1 < a'_2 < \ldots < a'_k$ and $a'_j \in [0, n)$ for all $j$, and also
define $a'_{k+1} = n$. We describe an incremental construction of $M_i$. For technical reasons, we start with an
artificial top row containing $1, 2, 3, \ldots, n$. Then we add the rows corresponding to $a'_1, a'_2, \ldots, a'_k$. The row
corresponding to $a'_j$ consists of three parts. The middle part starts at the $(a'_j + 1)$-th column, ends at the
$a'_{j+1}$-th column, and contains only 1's. The elements in the left part decrease by 1 and end with 2 at the
$a'_j$-th column; similarly, the elements in the right part (if any) start with 2 at the $(a'_{j+1} + 1)$-th column and
increase by 1. Formally, the $k$-th element of the $(j+1)$-th row, denoted $M_i[j+1, k]$, is defined as follows.


$$
M_i[j+1, k] =
\begin{cases}
a'_j - k + 2 & \text{if } k \in [1, a'_j] \\
1 & \text{if } k \in [a'_j + 1, a'_{j+1}] \\
k - a'_{j+1} + 1 & \text{if } k \in [a'_{j+1} + 1, n]
\end{cases}
\qquad (1)
$$


Finally, we end with an artificial bottom row containing $n, n-1, \ldots, 1$. See the upper part of Figure 1 for
an example. We need to argue that every $M_i$ is Monge. By Proposition 1, it is enough to consider every pair of
adjacent rows $r_1, r_2$ there. Define $r'_1[j] = r_1[j] - r_1[j-1]$ and similarly $r'_2[j] = r_2[j] - r_2[j-1]$. To prove that $M_i$
is Monge, it is enough to argue that $r'_1[j] \ge r'_2[j]$ for all $j \ge 2$. By construction, both $r'_1$ and $r'_2$ are of the form
$-1, -1, \ldots, -1, 0, 0, \ldots, 0, 1, 1, \ldots, 1$, and all 0's in $r'_2$ are on the right of all 0's in $r'_1$. Therefore, $M_i$ is Monge.

Now one can observe that the predecessor of $x \in [0, n^2)$ can be found by looking at the $(x \bmod n + 1)$-th
column of $M_i$ for $i = \lfloor x/n \rfloor$. We check if $x < a_1$, and if so return the predecessor of $a_1$ in the whole $S$. This can
be done in $O(1)$ time and $O(n)$ additional space by explicitly storing $a_1$ and its predecessor for every $i$.
Otherwise we know that the predecessor of $x$ is $a_j$ such that $x \bmod n \in [a'_j, a'_{j+1})$, and, by construction, we
only need to find $j \in [1, k]$ such that the $(x \bmod n + 1)$-th element of row $j+1$ in $M_i$ is 1. This is exactly
a subcolumn minimum query.
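
The construction of a single block can be illustrated with a small sketch. The code below is an assumption-laden illustration, not the paper's data structure: it materializes a block explicitly (the paper only evaluates entries on demand through Equation 1), and the helper names `build_block`, `is_monge` and `predecessor_offset` are hypothetical. It builds the rows from Equation 1, checks the condition of Proposition 1, and recovers the predecessor of an offset $x \bmod n$ by scanning one column for the entry equal to 1.

```python
def build_block(offsets, n):
    """Rows of one block for sorted distinct offsets a'_1 < ... < a'_k in [0, n)."""
    a = list(offsets) + [n]              # sentinel a'_{k+1} = n
    rows = [list(range(1, n + 1))]       # artificial top row 1, 2, ..., n
    for j in range(len(offsets)):
        lo, hi = a[j], a[j + 1]
        row = []
        for k in range(1, n + 1):        # 1-based column index, as in Equation 1
            if k <= lo:
                row.append(lo - k + 2)   # left part, ends with 2 at column a'_j
            elif k <= hi:
                row.append(1)            # middle part, all 1's
            else:
                row.append(k - hi + 1)   # right part, starts with 2
        rows.append(row)
    rows.append(list(range(n, 0, -1)))   # artificial bottom row n, ..., 1
    return rows


def is_monge(m):
    """Adjacent-entries test of Proposition 1 (inequality read as <=)."""
    return all(
        m[i][j] + m[i + 1][j + 1] <= m[i + 1][j] + m[i][j + 1]
        for i in range(len(m) - 1)
        for j in range(len(m[0]) - 1)
    )


def predecessor_offset(offsets, n, r):
    """Largest offset <= r, found as a subcolumn minimum over the non-artificial rows."""
    if not offsets:
        return None
    block = build_block(offsets, n)
    col = r                              # 0-based index of the (r + 1)-th column
    j = min(range(1, len(offsets) + 1), key=lambda row: block[row][col])
    # The minimum equals 1 exactly when some a'_j <= r exists.
    return offsets[j - 1] if block[j][col] == 1 else None
```

For the part $i = 2$ of the example in Figure 1 (offsets 2, 5, 6 with $n = 8$), `predecessor_offset([2, 5, 6], 8, 4)` scans the fifth column and returns the offset 2. A linear scan replaces the subcolumn minimum structure here purely for brevity.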

We cannot simply concatenate all $M_i$'s to form a larger Monge matrix. We use Proposition 2 instead.
Initially, we set $M = M_0$. Then we consider every other $M_i$ one-by-one, maintaining the invariant that the current
$M$ is Monge and its last row is $n, n-1, \ldots, 1$. In every step we add the vector $H = [-n+1, -n+3, \ldots, n-1]$
to the current matrix $M$, obtaining a matrix $M'$ whose last row is $1, 2, \ldots, n$. By Proposition 2, $M'$ is
Monge. Then we can construct the new $M$ by appending $M_i$ without its first row to $M'$. Because the first
row of $M_i$ is also $1, 2, \ldots, n$, the new $M$ is also Monge. Furthermore, because we add the same value to
all elements in the same column of $M_i$, answering subcolumn minimum queries on $M_i$ can be done with
subcolumn minimum queries on the final $M$. The lower part of Figure 1 depicts the final Monge matrix $M$.
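
The stacking step can be sketched as follows. This is a minimal illustration under stated assumptions: blocks are given as explicit lists of rows (each starting with $1, \ldots, n$ and ending with $n, \ldots, 1$), and the function names `stack_blocks` and `is_monge` are hypothetical. The example uses $n = 3$ with one element of offset 1 in the middle part and two empty parts.

```python
def is_monge(m):
    """Adjacent-entries test of Proposition 1 (inequality read as <=)."""
    return all(
        m[i][j] + m[i + 1][j + 1] <= m[i + 1][j] + m[i][j + 1]
        for i in range(len(m) - 1)
        for j in range(len(m[0]) - 1)
    )


def stack_blocks(blocks, n):
    """Stack blocks M_0, M_1, ... into one matrix using the vector H of Proposition 2."""
    H = [2 * k - n - 1 for k in range(1, n + 1)]   # -n+1, -n+3, ..., n-1
    M = [row[:] for row in blocks[0]]
    for block in blocks[1:]:
        # Adding H column-wise turns the last row n, ..., 1 into 1, ..., n.
        M = [[v + h for v, h in zip(row, H)] for row in M]
        assert M[-1] == list(range(1, n + 1))
        # Append the next block without its (identical) first row.
        M.extend(row[:] for row in block[1:])
    return M


empty = [[1, 2, 3], [3, 2, 1]]                     # part with no elements
one = [[1, 2, 3], [2, 1, 1], [3, 2, 1]]            # part with a single offset a' = 1
M = stack_blocks([empty, one, empty], 3)
```

In this example the row `[2, 1, 1]` of the middle block ends up as `[0, 1, 3]`, i.e. shifted by one copy of `H = [-2, 0, 2]`, so the position of its column minima relative to the other rows of the same block is unchanged.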

We need to argue that elements of $M$ can be accessed in $O(1)$ time using a data structure of size $O(n)$. To
retrieve $M[j,k]$, first we look up in $O(1)$ time the appropriate $M_i$ from which it originates. This can be
preprocessed and stored for every $j$ in $O(n)$ total space and allows us to reduce the question to retrieving
$M_i[j',k]$. Because Proposition 2 is applied exactly $n-1-i$ times after appending $M_i$ to the current $M$,
we can return $M_i[j',k] + (n-1-i)H[k]$. To find $M_i[j',k]$, we just directly use Equation 1, which
requires only storing $a'_1, a'_2, \ldots, a'_k$ in $O(n)$ total space. □


Fig. 1. Reduction for $n = 8$ and $S = \{8 \cdot 0 + 3, 8 \cdot 2 + 2, 8 \cdot 2 + 5, 8 \cdot 2 + 6, 8 \cdot 5 + 2, 8 \cdot 5 + 6, 8 \cdot 7 + 1, 8 \cdot 7 + 4\}$.


5 Data structure for partial Monge matrices

Our goal in this section is to extend the solution described in Section 3 to partial Monge matrices. Recall
that in a partial Monge matrix $M$, for any $i < j$ and $k < \ell$, the condition $M[i,k] + M[j,\ell] \ge M[i,\ell] + M[j,k]$
holds only if all of $M[i,k]$, $M[j,\ell]$, $M[i,\ell]$, $M[j,k]$ are defined. Not all entries in $M$ are defined, but the defined
entries in every row and every column are contiguous. Let $s_i$ and $t_i$ denote the first and last columns
containing defined entries in the $i$-th row, respectively. We assume that we know the coordinates of at least
one of the defined entries. This allows us to find all $s_i$'s and $t_i$'s in $O(n \log n)$ time.

We begin with noting that subcolumn (and subrow) maximum queries can be implemented on partial 
Monge matrices in the same bounds as full Monge matrices (i.e., the bounds of Theorem 3 and Lemma 3 
also apply to partial Monge matrices). This is because we can implicitly fill appropriate constants instead of 
the undefined entries to turn a partial Monge matrix into a full Monge matrix (see [16] for details). Upon a
subcolumn query (a column c and a range of rows R), we first restrict R to the defined entries in the column c
and only then query the data structure. For submatrix queries however, this trick only works if the query range 
is entirely defined. In general, it does not work because the defined entries in the query range do not necessarily 
form a submatrix. Handling submatrix queries is therefore more complicated. We describe our solution next. 

We will need the following preliminary lemma, which follows quite easily from the persistent predecessor
structure of Chan [9].

Lemma 5. A collection $S$ of $O(n)$ weighted points on an $n \times n$ grid can be preprocessed in $O(n \log\log n)$
time and $O(n)$ space, so that, given any $(x,y)$, the maximum weight of a point $(x',y') \in S$ such that $x' \ge x$
and $y' \ge y$ can be calculated in $O(\log\log n)$ time.

Proof. We use the standard geometric idea of sweeping the grid with a horizontal line while maintaining a
data structure describing the current situation. The data structure is made partially persistent, so that, after
sweeping, given a query $(x,y)$ we can retrieve the version of the structure corresponding to a horizontal line
passing through $(x,y)$. Querying that version of the data structure will allow us to answer the request. The
data structure will be a predecessor structure made persistent using the result of Chan [9]. See Theorem 5
of [20] for a more detailed description of a similar lemma.

Denote the points by $(x_i,y_i)$ and their corresponding weights by $w_i$. We assume that the weights are
distinct. We sweep the grid with a horizontal line starting at $y = n$. The predecessor structure stores
$x$-coordinates of some of the already seen points. $x_i$ is stored in the predecessor structure iff $y_i \ge y$ and
there is no $i'$ such that $y_{i'} \ge y$, $x_{i'} \ge x_i$ and $w_{i'} > w_i$. This is because otherwise the $i'$-th point is a better
answer than the $i$-th point for any query processed using this or any future version of the data structure.
Consequently, the points whose $x$-coordinates are stored in the predecessor structure can be arranged so that
their $x$-coordinates are increasing and the weights decreasing. Then it follows that locating the maximum
weight of a point $(x',y') \in S$ such that $x' \ge x$ and $y' \ge y$ can be done by finding the successor of $x$ in the
version of the predecessor structure corresponding to $y$. Maintaining the structure while sweeping the grid
is also done with a predecessor search. After having seen a new point $(x_i,y_i)$ we locate the predecessor of
$x_i$. If the weight of the corresponding point is smaller than $w_i$, we remove it from the structure and repeat.

A persistent predecessor search structure can be implemented in space $O(n)$ while keeping the query
time $O(\log\log n)$ [9]. Consequently, we can build in $O(n \log\log n)$ time a structure of size $O(n)$ answering
queries in $O(\log\log n)$ time. □
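
The sweep in the proof of Lemma 5 can be sketched in a few lines. The code below is only an illustration under loud simplifications: persistence is emulated by copying the whole structure after every sweep position (quadratic space instead of the $O(n)$ of Chan's structure [9]), sorted Python lists with `bisect` stand in for the predecessor structure, and the names `build_versions` and `dominance_max` are hypothetical. Weights are assumed distinct, as in the proof.

```python
import bisect


def build_versions(points, n):
    """points: list of (x, y, w) with coordinates in 1..n. Returns {y: (xs, ws)}."""
    by_y = {}
    for x, y, w in points:
        by_y.setdefault(y, []).append((x, w))
    xs, ws = [], []                  # x-coordinates increasing, weights decreasing
    versions = {}
    for y in range(n, 0, -1):        # sweep the horizontal line downwards from y = n
        for x, w in by_y.get(y, []):
            pos = bisect.bisect_left(xs, x)          # successor of x
            if pos < len(xs) and ws[pos] > w:
                continue                              # dominated by a stored point
            hi = bisect.bisect_right(xs, x)
            lo = hi
            while lo > 0 and ws[lo - 1] < w:          # drop lighter predecessors
                lo -= 1
            xs[lo:hi] = [x]
            ws[lo:hi] = [w]
        versions[y] = (xs[:], ws[:])                  # naive copy instead of persistence
    return versions


def dominance_max(versions, x, y):
    """Maximum weight of a point (x', y') with x' >= x and y' >= y, or None."""
    xs, ws = versions[y]
    pos = bisect.bisect_left(xs, x)                   # successor of x in version y
    return ws[pos] if pos < len(xs) else None
```

For example, with the points `[(1, 1, 5), (3, 2, 9), (2, 4, 7), (4, 3, 1)]` on a $4 \times 4$ grid, the query `(3, 3)` looks up the version for $y = 3$, finds the successor of 3 among the stored $x$-coordinates `[2, 4]`, and returns the weight 1.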

Before we handle an arbitrary partial Monge matrix, we describe a solution to a restricted type of partial
Monge matrix called a staircase matrix. In a staircase matrix, the defined entries in every row either all
start in the first column, or all end in the last column. We begin with a weaker result, which is that one
can answer submatrix maximum queries on an $n \times n$ staircase matrix in $O(\log\log n)$ time with a structure
of size $O(n \log n)$. We will then show how to reduce the space to $O(n)$, and finally how to use the solution
for staircase matrices in order to handle arbitrary partial Monge matrices.

Theorem 6. Given an $n \times n$ staircase Monge matrix $M$, a data structure of size $O(n \log n)$ can be
constructed in $O(n \log n)$ time to answer submatrix maximum queries in $O(\log\log n)$ time.

Proof. Because of left-right symmetry, we can assume that the defined entries in row $i$ start in the first
column and end in column $t_i$. Notice that either $t_1 \le t_2 \le \ldots \le t_n$ or $t_1 \ge t_2 \ge \ldots \ge t_n$. Without loss
of generality we will assume the latter. This is enough because we will not be explicitly using the Monge
property in our solution, except for applying Theorem 2 on a copy of $M$ (called $\bar{M}$) where the undefined
entries are appropriately filled.

We partition $M$ into full Monge matrices using a standard method: First, create a full Monge matrix by
taking the upper-left fragment $[1, n/2] \times [1, t_{n/2}]$ of $M$. Then, recursively decompose the staircase matrices
created by taking the upper-right fragment $[1, n/2] \times [t_{n/2} + 1, t_1]$ and the lower-left fragment $[n/2 + 1, n] \times [1, t_{n/2+1}]$
of $M$. See Figure 2. It is easy to verify that the decomposition consists of at most $2n$ full Monge matrices
(called fragments). The decomposition has other useful properties on which we elaborate further.
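
The recursive decomposition can be sketched as below. This is an illustration under assumptions rather than the paper's exact procedure: rows are 0-based, columns are 1-based, `t[i]` is the last defined column of row `i` and is assumed non-increasing, the middle row is placed in the upper-left fragment, and the name `staircase_fragments` is hypothetical.

```python
def staircase_fragments(t):
    """Split a staircase (row i defined on columns 1..t[i]) into full rectangles.

    Returns a list of fragments (r0, r1, c0, c1), all bounds inclusive.
    """
    out = []

    def rec(r0, r1, c0):
        if r0 > r1:
            return
        mid = (r0 + r1) // 2
        if t[mid] >= c0:
            # Upper-left fragment: every row above mid is at least as long as row mid.
            out.append((r0, mid, c0, t[mid]))
        # Upper-right staircase: rows above mid, columns to the right of the fragment.
        rec(r0, mid - 1, max(c0, t[mid] + 1))
        # Lower-left staircase: rows below mid, same leftmost column.
        rec(mid + 1, r1, c0)

    rec(0, len(t) - 1, 1)
    return out


t = [8, 8, 6, 5, 5, 3, 2, 1]
fragments = staircase_fragments(t)
```

Each recursive call halves the number of rows, so any fixed row meets at most one fragment per recursion level. This is the property used later to bound the number of fragments crossed by a horizontal line.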



Fig. 2. (a) A staircase $n \times n$ Monge matrix partitioned into $2n$ smaller full Monge matrices (fragments). (b) A query
range $[i_0,i_1] \times [j_0,j_1]$ decomposed into two full Monge matrices A and B and one dominance query C. (c) The
dominance query as vertical and horizontal lines (the green fragment is fully inside the range and the blue and red
fragments intersect the horizontal line).


Consider a query range $[i_0,i_1] \times [j_0,j_1]$. To find the maximum (defined) $M[i,j]$ over all $i \in [i_0,i_1]$ and
$j \in [j_0,j_1]$ we proceed as follows. The simple case is when the query range is fully within the defined part
of $M$. To handle this case, we apply Theorem 2 on a copy of $M$ (denoted $\bar{M}$) where the undefined entries
are appropriately (and implicitly) filled. This allows us to do submatrix queries in $O(\log\log n)$ time when
the query range is fully defined. Otherwise, we decompose the query into three parts. The first part, which
we call a dominance maximum query, is to find the maximum $M[i,j]$ over all $i \ge i'$ and $j \ge j'$, for $i',j'$ to
be defined shortly. The other two are submatrix maximum queries fully within the defined part of $M$ (and
hence can be processed by querying the structure built for $\bar{M}$ in $O(\log\log n)$ time). The decomposition is
performed in $O(1)$ time by setting $j' = t_{i_1} + 1$ and choosing the smallest $i' \ge i_0$ such that $t_{i'} < j_1$ (which
can be preprocessed for every possible $j_1$ in $O(n)$ space). The two submatrix maximum queries are therefore
over the full Monge matrices $[i_0,i_1] \times [j_0, j'-1]$ and $[i_0, i'-1] \times [j', j_1]$. Hence, it is enough to focus on
answering dominance maximum queries.

To answer a dominance maximum query (i.e., to find the maximum $M[i,j]$ over all $i \ge i'$ and $j \ge j'$) we
use the partition of $M$ into full Monge matrices (fragments). Every such fragment is either fully inside the
query range, fully outside of the query range, or intersected by the query range boundary.

Fragments inside the query range. A fragment $[r_0,r_1] \times [c_0,c_1]$ is fully inside the query range iff $r_0 \ge i'$
and $c_0 \ge j'$. This observation allows us to reduce computing the maximum over all matrices fully inside the
query to the problem defined in Lemma 5. The reduction is simply that for every fragment $[r_0,r_1] \times [c_0,c_1]$
we create a point $(r_0,c_0)$ and set its weight to be the maximum inside the fragment. As a result, we create
at most $O(n)$ points on the $n \times n$ grid. Using Theorem 2 on the filled copy of $M$ to create every point separately takes
$O(n \log\log n)$ total preprocessing time, so in $O(n \log\log n)$ time we can construct a structure of size $O(n)$
answering queries in $O(\log\log n)$ time.

Fragments intersected by the query range. We are left only with finding the maximum over all
fragments intersected by the boundary of our dominance maximum query. We partition these fragments
into three groups. The first consists of the single fragment containing $M[i',j']$. The maximum there can be
found with a submatrix maximum query on the filled copy of $M$ in $O(\log\log n)$ time. All other fragments intersected by the
boundary are either intersected by the horizontal line $y = i'$ or the vertical line $x = j'$, but not both. We
show how to find the maximum over all matrices intersected by the horizontal line $y = i'$ and fully to the
right of the vertical line $x = j'$ (the other case is symmetric).

By the properties of our decomposition scheme, there are at most $\log n$ fragments intersected by any
horizontal line, and they can be arranged in the natural left-to-right order. For every possible horizontal
line, we store these at most $\log n$ fragments in an array. For every fragment we store the coordinates of its
corresponding submatrix of $M$ and the maximum in all of its entries below the horizontal line. The array is
additionally equipped with the maximum over all maxima in each one of its suffixes. Such preprocessed data
allows us to find the maximum over all fragments intersected by a horizontal line $y = i'$ and fully on the right
of a vertical line $x = j'$ in $O(\log\log n)$ time: First, we binary search over the array stored for $y = i'$ to locate
the leftmost fragment completely on the right of $x = j'$. Then we return the stored corresponding maximum.
Notice that the binary search also allows us to locate the fragment containing $M[i',j']$. Consequently, the
whole query time is $O(\log\log n)$ using $O(n \log n)$ space for this part of the implementation. To guarantee
$O(n \log n)$ preprocessing time, we run the SMAWK algorithm on every fragment in the decomposition in
total $O(n \log n)$ time. This gives us the maximum in every row of every fragment. This is then enough to
construct all arrays in $O(n \log n)$ time. □
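
The per-line array with suffix maxima can be sketched as follows. This is a minimal illustration for a single horizontal line, assuming the fragments crossing it are already given left to right together with their maxima below the line; the tuple layout and the names `preprocess_line` and `max_right_of` are hypothetical.

```python
import bisect


def preprocess_line(fragments):
    """fragments: list of (c0, c1, max_below_line), sorted left to right.

    Returns the first columns and the suffix maxima of the stored maxima.
    """
    starts = [c0 for c0, _, _ in fragments]
    suffix = [None] * len(fragments)
    best = float("-inf")
    for idx in range(len(fragments) - 1, -1, -1):
        best = max(best, fragments[idx][2])
        suffix[idx] = best
    return starts, suffix


def max_right_of(starts, suffix, j):
    """Maximum over fragments that start strictly to the right of column j."""
    pos = bisect.bisect_right(starts, j)      # leftmost fragment with c0 > j
    return suffix[pos] if pos < len(suffix) else None


starts, suffix = preprocess_line([(1, 3, 10), (4, 5, 7), (6, 8, 9)])
```

Since each array holds at most $\log n$ fragments, the binary search performed by `bisect_right` takes $O(\log\log n)$ steps, matching the bound in the proof. The fragment at index `pos - 1` is the one containing column `j`, which is handled separately.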

We now proceed to improving Theorem 6 so that the structure needs just linear space. The main idea is to
partition the $n \times n$ staircase matrix $M$ into cells of size $\log n \times \log n$ and then define a new smaller $(n/\log n) \times
(n/\log n)$ staircase matrix $M'$ (whose entries correspond to cell-maxima in $M$) on which we apply Theorem 6.
To implement this idea we need a number of additional auxiliary data structures, which take $O(n)$ space in
total. We start with an auxiliary lemma, which will be used to provide constant-time access to entries of $M'$.

Lemma 6. Given an $n \times n$ Monge matrix $M$ partitioned into $\log n \times \log n$ cells, a data structure of size
$O(n)$ can be constructed in $O(n \log n)$ time to find the maximum in a given cell in $O(1)$ time.

Proof. We partition $M$ into $n/\log n$ horizontal slices, each consisting of $\log n$ rows (and all columns). Consider
a single slice, which is a $\log n \times n$ Monge matrix. We store its breakpoints $c_1 < c_2 < \ldots < c_k$ (where $k \le \log n$)
in an atomic heap, consequently allowing predecessor queries in $O(1)$ time (this is exactly how the structure
from Lemma 3 works). Additionally, similarly to Lemma 2, for every $i \ge 2$ we precompute the value of

$$m_i = \max_{j \in [c_{i-1}, c_i)} M[r(c_{i-1}), j]$$

and augment these values with a (one dimensional) range maximum data structure. Here, $r(c)$ denotes the
row containing the maximum element in the $c$-th column of the slice in question. Using two predecessor
queries and one range maximum query, the problem of finding the maximum in a given cell (which is fully
contained in a single horizontal slice) reduces in $O(1)$ time to finding the maximum in at most two rows.
The total space is $O(n/\log n \cdot \log n) = O(n)$ and the bottleneck in the preprocessing is computing the
breakpoints for all slices. The breakpoints of a single slice can be computed in $O(\log^2 n)$ time by adding one row
at a time, as done in the proof of Lemma 1. In total, this takes $O(n/\log n \cdot \log^2 n) = O(n \log n)$ time.

We repeat the above reasoning on the transpose of $M$. As a result, we either already know the maximum
element, or we have isolated at most two rows and at most two columns, such that the maximum lies in
one of these rows and one of these columns. This gives us at most four candidates for the maximum, which
can be retrieved and compared naively. □

We are now ready to present our linear-space improvement to Theorem 6.

Theorem 7. Given an $n \times n$ staircase Monge matrix $M$, a data structure of size $O(n)$ can be constructed
in $O(n \log n)$ time to answer submatrix maximum queries in $O(\log\log n)$ time.

Proof. As in the proof of Theorem 6, we can assume that the defined entries in row $i$ start in the first
column and end in column $t_i$, and that $t_1 \ge t_2 \ge \ldots \ge t_n$.

We partition $M$ into cells of size $\log n \times \log n$ and then define a smaller $(n/\log n) \times (n/\log n)$ staircase
matrix $M'$. Notice that, unlike in Lemma 6, $M$ is a staircase Monge matrix (and not a full Monge matrix).
This means that there are three types of cells in $M$: fully defined, partially defined, and fully undefined.
An entry of $M'$ is defined iff its corresponding cell in $M$ is fully defined. In this case the entry is equal to
the maximum in the corresponding cell. The undefined entries of $M'$ are the ones corresponding to either
partially defined or fully undefined cells of $M$. We appropriately (and implicitly) fill these entries to turn
$M'$ into a full Monge matrix, on which we apply Lemma 6. This gives us constant-time access to the entries
of $M'$, so finally we can apply Theorem 6 to preprocess it in $O(n)$ space and $O(n \log n)$ time to answer
submatrix maximum queries in $O(\log\log n)$ time.

Regarding partially defined cells, we observe that there are at most $2n/\log n$ of them. Furthermore, they
can be arranged in a linear order, so that if the part of $M$ corresponding to the $i$-th partially defined cell
is $[r_i, r'_i] \times [c_i, c'_i]$, then for all $i$ either $[r_i, r'_i] = [r_{i+1}, r'_{i+1}]$ and $c'_i + 1 = c_{i+1}$, or $r_i = r'_{i+1} + 1$ and $[c_i, c'_i] =
[c_{i+1}, c'_{i+1}]$ (to be more precise, we might need to declare some fully defined cells partially defined to guarantee
this property). We create a predecessor structure storing all $r_i$'s and a separate predecessor structure storing all
$c_i$'s. We also compute the maximum in every partially defined cell and store them in an array (arranged in the
aforementioned linear order) augmented with a (one dimensional) range maximum structure. Computing the
maximum in all partially defined cells is done in $O(n/\log n \cdot \log n \cdot \alpha(\log n)) = O(n \cdot \alpha(\log n))$ time using [18].

By the same reasoning given in the proof of Theorem 6, it is enough to implement dominance maximum
queries on $M$. A dominance maximum query can be decomposed into (i) a dominance maximum query on
$M'$, which can be answered in $O(\log\log n)$ time, (ii) finding the maximum inside all partially defined cells
fully within the query range, and (iii) finding the maximum inside partially defined cells intersected by the
boundaries of the query range. All partially defined cells fully within the query range create a contiguous
interval in the linear order. The range can be determined in $O(\log\log n)$ time using the predecessor structures
storing all $r_i$'s and $c_i$'s, and then the maximum can be found in $O(1)$ time with a (one dimensional) range
maximum query. It remains to calculate the maximum inside partially defined cells intersected by the
boundaries of the query range. We will describe how to process all partially defined cells intersected by the
horizontal boundary. Handling the vertical boundary is symmetric.

Let the dominance maximum query be specified by $(i',j')$. We want to compute the maximum inside
the query range and belonging to a partially defined cell intersected by the horizontal line $y = i'$. All such
cells create a contiguous interval in the linear order, which can be determined with two predecessor queries
in $O(\log\log n)$ time. In the same complexity, we can find the leftmost such cell $u$ which is not fully on the
left of the vertical line $x = j'$. We decompose the original query into a dominance maximum query inside $u$,
and the remaining part. The remaining part starts at a left boundary of a partially defined cell and consists
of the entries at or below $y = i'$ in all partially defined cells to the right of $u$. Consequently, the answer can
be preprocessed for every point on a left boundary of a partially defined cell using $O(n/\log n \cdot \log n) = O(n)$
space and $O(n/\log n \cdot \log n \cdot \alpha(\log n)) = O(n \cdot \alpha(\log n))$ time using [18]. The bottleneck in the preprocessing
is computing the maximum in every row of every partially defined cell.

It remains to describe how to handle the dominance query in $u$. In other words, after constructing
in $O(n \log n)$ time an $O(n)$ size structure, we have, in $O(\log\log n)$ time, reduced an arbitrary
dominance maximum query into a dominance maximum query inside a single partially defined cell.
This cell is a smaller $\log n \times \log n$ staircase matrix, and furthermore there are at most $2n/\log n$ such
cells. By recursing on each of these smaller staircase matrices separately, we construct in additional
$O(n/\log n \cdot \log n \log\log n) = O(n \log\log n)$ time an $O(n/\log n \cdot \log n) = O(n)$ size structure, which reduces
the original dominance query, in additional $O(\log\log\log n)$ time, into a dominance maximum query inside
one of $O(n/\log n \cdot \log n/\log\log n) = O(n/\log\log n)$ tiny $\log\log n \times \log\log n$ staircase matrices (each of them
being a submatrix of the original $M$). By recursing again on every tiny staircase matrix separately, we
construct in additional $O(n \log\log\log n)$ time an $O(n)$ size structure, which reduces the original arbitrary
dominance query in additional $O(\log\log\log\log n)$ time into a dominance maximum query inside a
$(\log\log\log n) \times (\log\log\log n)$ submatrix of $M$. Such a dominance maximum query can be answered naively,
resulting in $O(\log\log n + (\log\log\log n)^2) = O(\log\log n)$ total query time. □
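
The query time can be tallied level by level. The following is a worked bound, written as a sketch under the assumption that $n$ is large enough for all iterated logarithms to be defined; the symbol $T(n)$ for the total query time is introduced here only for illustration:

```latex
\begin{align*}
T(n) &= \underbrace{O(\log\log n)}_{\text{cells of size } \log n}
      + \underbrace{O(\log\log\log n)}_{\text{cells of size } \log\log n}
      + \underbrace{O(\log\log\log\log n)}_{\text{cells of size } \log\log\log n}
      + \underbrace{O\bigl((\log\log\log n)^2\bigr)}_{\text{naive scan}} \\
     &= O(\log\log n),
     \qquad \text{since } (\log L)^2 = o(L) \text{ for } L = \log\log n .
\end{align*}
```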

We are now ready to prove the main theorem of this section, which is that using Theorem 7 we
can actually implement submatrix maximum queries on arbitrary (and not just staircase) partial Monge
matrices. The idea is to partition the partial Monge matrix into staircase matrices, so that each row
and each column belong to $O(1)$ staircase matrices. Such partitioning was used in [1,16]. We build the
data structure of Theorem 7 on each staircase matrix in the decomposition, and build an additional data
structure for queries spanning more than one staircase matrix.

Theorem 8. Given an $n \times n$ partial Monge matrix $M$, a data structure of size $O(n)$ can be constructed in
$O(n \log n)$ time to answer submatrix maximum queries in $O(\log\log n)$ time.


Fig. 3. A partial Monge matrix where the defined entries are gray and the undefined white. In black is a partitioning 
of the matrix into smaller staircase matrices. Each row is covered by at most two staircase matrices, and each column 
by at most four. 


Proof. We partition $M$ into staircase matrices as depicted in Figure 3. This partition was used in [1,16].
We define it here formally for completeness: Let the defined entries in the $i$-th row start in the $s_i$-th column
and end in the $t_i$-th column (without loss of generality, every row contains at least one defined entry). The
sequence $s_i$ is first non-increasing and then non-decreasing. Similarly, the sequence $t_i$ is first non-decreasing
and then non-increasing. Initially, we partition $M$ into three slices. In the first slice, $s_i$ is non-increasing
and $t_i$ is non-decreasing. In the second slice, either both $s_i$ and $t_i$ are non-increasing, or both $s_i$ and $t_i$ are
non-decreasing. Finally, in the third slice $s_i$ is non-decreasing and $t_i$ is non-increasing. The first and the
third slices are then further partitioned into two staircase matrices each. The second slice can be broken
into staircase matrices by dividing along alternating rows and columns as shown in Figure 3. It is easy to
verify that, in the resulting decomposition, each row is covered by at most two staircase matrices, and each
column is covered by at most four staircase matrices. Additionally, the staircase matrices contributed by
the second slice can be partitioned into two collections such that any two matrices in the same collection
are row-disjoint and column-disjoint.

The data structure consists of the following components. We apply Theorem 7 on every staircase
matrix in our partition. We also store additional data for both collections. By left-right symmetry, we can
assume that the ranges of rows and columns of the matrices in the collection are $[r_1, r_2), [r_2, r_3), \ldots$ and
$[c_1, c_2), [c_2, c_3), \ldots$, respectively. We create a predecessor structure storing all $r_i$'s and a separate predecessor
structure storing all $c_i$'s. We also compute and store the maximum inside every staircase matrix in the
collection (this is done in total $O(n \cdot \alpha(n))$ time using the algorithm of Klawe and Kleitman [18]), and
augment these maxima with a (one dimensional) range maximum structure.

Now consider a submatrix maximum query $[i_0,i_1] \times [j_0,j_1]$. We first query the $O(1)$ structures built for
the staircase matrices corresponding to the first and the third slice of $M$. Next, we consider each of the two
collections separately. To find the maximum $M[i,j]$ over all $i \in [i_0,i_1]$, $j \in [j_0,j_1]$, we use the predecessor
structures to determine in $O(\log\log n)$ time the following values (without loss of generality, they all exist):

1. $i'_0$ such that $i_0 \in [r_{i'_0}, r_{i'_0+1})$,

2. $i'_1$ such that $i_1 \in [r_{i'_1}, r_{i'_1+1})$,

3. $j'_0$ such that $j_0 \in [c_{j'_0}, c_{j'_0+1})$,

4. $j'_1$ such that $j_1 \in [c_{j'_1}, c_{j'_1+1})$.

We then query the structures built for the $(i'_0)$-th, $(i'_1)$-th, $(j'_0)$-th, and $(j'_1)$-th staircase matrix in the
collection. Now either we have already found the maximum, or it belongs to one of the staircase matrices
fully contained in the query range. Consequently, the maximum can be found in $O(1)$ time with a single
(one dimensional) range maximum query. □